How to Train an Object Detection Network

YOLOv3 is a real-time multi-object detector that follows the single-shot detection architecture. The training steps below are compiled from the instructions at AlexeyAB-Darknet.

Labeling

We label each object in the images of a dataset with AlexeyAB-Yolo_mark, a GUI tool for marking bounding boxes of objects and generating annotation files. Each line of an annotation file follows this convention:

<object-class> <x_center> <y_center> <width> <height>
<object-class> - integer object number from 0 to (classes-1)
<x_center> <y_center> <width> <height> - float values relative to the width and height of the image, for example <x> = <absolute_x> / <image_width> or <height> = <absolute_height> / <image_height>
<x_center> <y_center> - the center of the rectangle (not the top-left corner)
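As a minimal sketch of the convention above, the following Python function converts a box given in absolute pixel corners into one annotation line. The function name and the corner-based input format are assumptions made for illustration; Yolo_mark produces these lines for you.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert an absolute pixel box (corner format) into one annotation line."""
    # The center and size are divided by the image size, so they are relative values.
    x_center = (x_min + x_max) / 2.0 / image_width
    y_center = (y_min + y_max) / 2.0 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 pixel box with top-left corner (100, 200) in a 640x480 image.
print(to_yolo_line(0, 100, 200, 300, 400, 640, 480))
# prints: 0 0.312500 0.625000 0.312500 0.416667
```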

Create the file train.txt in the directory build\darknet\x64\data\, listing the filenames of your images (relative image paths), one per line. This file is generated automatically if you use AlexeyAB-Yolo_mark to annotate your data.
Download the pre-trained weights for the convolutional layers (darknet53.conv.74, 154 MB) and put the file in the directory build\darknet\x64\.

Optimize anchors

Recalculate the anchors for your dataset, using the width and height from your cfg-file: ./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416

Then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file.

You should also change the anchor indexes in masks= for each [yolo]-layer, so that the 1st [yolo]-layer has the anchors larger than 60x60, the 2nd has the anchors larger than 30x30, and the 3rd has the remaining ones. Also set filters=(classes + 5)*<number of mask> in the [convolutional] layer before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, just try using all the default anchors.
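The mask assignment and the matching filters value can be sketched in Python as follows. The rule used here is an assumption: an anchor counts as "larger than NxN" when both of its sides are at least N. This reading was chosen because it reproduces the default yolov3.cfg masks (6,7,8 / 3,4,5 / 0,1,2) for the default anchors; the function names are hypothetical.

```python
# Default yolov3.cfg anchors as (width, height); replace with your calc_anchors output.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def assign_masks(anchors, large=60, medium=30):
    """Return anchor indexes for the 1st, 2nd and 3rd [yolo]-layers.

    Assumption: "larger than NxN" means both sides are at least N.
    """
    masks = {"first": [], "second": [], "third": []}
    for index, (width, height) in enumerate(anchors):
        if min(width, height) >= large:
            masks["first"].append(index)
        elif min(width, height) >= medium:
            masks["second"].append(index)
        else:
            masks["third"].append(index)
    return masks


def filters_for(classes, mask):
    """filters=(classes + 5)*<number of mask> for the layer before a [yolo]-layer."""
    return (classes + 5) * len(mask)


masks = assign_masks(ANCHORS)
print(masks)
print(filters_for(2, masks["first"]))
```

If one of the three lists comes out empty or very unbalanced, that is the situation in which falling back to the default anchors is suggested.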

Training

Clone the repository and build using cmake . && make.
Create file yolo-obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo-obj.cfg)
Change line batch to batch=64
Change line subdivisions to subdivisions=8
Change line max_batches to (classes*2000), e.g. max_batches=6000 if you train for 3 classes
Change line steps to 80% and 90% of max_batches, e.g. steps=4800,5400
Change line classes=80 to your number of object classes in each of the 3 [yolo]-layers
Change filters=255 to filters=(classes + 5)*3 in the 3 [convolutional] layers before each [yolo] layer (if classes=2 then write filters=21)
Set flag random=1 in your .cfg-file; it will increase precision by training Yolo at different resolutions
Create file obj.names in the directory build\darknet\x64\data\, with the object names, one per line
Create file obj.data in the directory build\darknet\x64\data\, containing the following:

classes= 2
train  = data/train.txt
valid  = data/test.txt
names = data/obj.names
backup = backup/

Put image-files (.jpg) of your objects in the directory build\darknet\x64\data\obj\
Start training with the command: ./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74. The file yolo-obj_last.weights will be saved to build\darknet\x64\backup\ every 100 iterations.
After every 100 iterations you can stop and later resume training from that point. For example, after 2000 iterations you can stop training, and later resume it using: ./darknet detector train data/obj.data yolo-obj.cfg backup/yolo-obj_2000.weights
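The cfg arithmetic from the steps above (max_batches, steps and filters) depends only on the number of classes. A small Python sketch, with a hypothetical helper name, makes the numbers easy to check before editing yolo-obj.cfg:

```python
def cfg_values(classes, masks_per_layer=3):
    """Derive the cfg values described above from the number of classes."""
    max_batches = classes * 2000
    # Integer arithmetic keeps the 80% and 90% steps exact.
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    filters = (classes + 5) * masks_per_layer
    return {"max_batches": max_batches, "steps": steps, "filters": filters}


print(cfg_values(3))
# prints: {'max_batches': 6000, 'steps': (4800, 5400), 'filters': 24}
```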

Note: If during training you see nan values in the avg (loss) field, training has gone wrong. If nan appears only in other lines, training is going well.

Note: If you changed width= or height= in your cfg-file, then new width and height must be divisible by 32.
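A quick way to check a candidate resolution, or to find the closest allowed one, is sketched below. The helper names are hypothetical and exact halves are rounded up.

```python
def is_valid_size(value, multiple=32):
    """True if value can be used as width= or height= (divisible by 32)."""
    return value > 0 and value % multiple == 0


def nearest_valid_size(value, multiple=32):
    """Round to the nearest positive multiple of 32 (halves round up)."""
    return max(multiple, (value + multiple // 2) // multiple * multiple)


print(is_valid_size(416), is_valid_size(600), nearest_valid_size(600))
# prints: True False 608
```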

Note: If an Out of memory error occurs, increase subdivisions in your .cfg-file to 16, 32 or 64.
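Raising subdivisions helps because each batch is split into that many parts, so fewer images are held in memory at once. The sketch below assumes the number of images per pass is batch divided by subdivisions (integer division); the function name is hypothetical.

```python
def images_per_pass(batch, subdivisions):
    """Images processed at once, assuming the batch is split into equal parts."""
    return batch // subdivisions


for subdivisions in (8, 16, 32, 64):
    print(f"subdivisions={subdivisions}: {images_per_pass(64, subdivisions)} images at once")
```

With batch=64, moving from subdivisions=8 to subdivisions=64 reduces the images per pass from 8 to 1, at the cost of slower iterations.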

Stop Training

During training you will see varying indicators of error. You should stop when the average loss (0.XXXXXXX avg) no longer decreases:

Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8
Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8
9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
Loaded: 0.000000 seconds

9002 - iteration number (number of batch)
0.060730 avg - average loss (error) - the lower, the better
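To watch the average loss over time, the status line can be parsed from a saved training log. The sketch below is written against the line format shown above and nothing else; the pattern and function name are assumptions, and other Darknet versions may print the line differently.

```python
import re

# Matches lines such as "9002: 0.211667, 0.060730 avg, 0.001000 rate, ..."
STATUS_LINE = re.compile(r"^\s*(\d+):\s*(\S+),\s*(\S+) avg")


def parse_status(line):
    """Return (iteration, average_loss) for a status line, else None."""
    match = STATUS_LINE.match(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(3))


line = "9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images"
print(parse_status(line))
# prints: (9002, 0.06073)
```

Lines that start with "Region" return None, so the function can be applied to every line of the log.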

Once training is stopped, take some of the last .weights-files from darknet\build\darknet\x64\backup and choose the best of them. For example, if you stopped training after 9000 iterations, the best result may come from one of the earlier weights (7000, 8000, 9000). This can happen due to overfitting.
First, in your file obj.data you must specify the path to the validation dataset, valid = valid.txt (valid.txt has the same format as train.txt). If you have no validation images, just copy data\train.txt to data\valid.txt. However, it is best to first divide the labeled data into train and valid sets.
If training is stopped after 9000 iterations, use these commands to validate some of the previous weights:

./darknet detector map data/obj.data yolo-obj.cfg backup/yolo-obj_9000.weights
./darknet detector map data/obj.data yolo-obj.cfg backup/yolo-obj_8000.weights
./darknet detector map data/obj.data yolo-obj.cfg backup/yolo-obj_7000.weights

Choose the weights-file with the highest mAP (mean average precision) or IoU (intersection over union).
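The mAP values are only meaningful if the validation images were not used for training. A minimal sketch of dividing the labeled image list into train and valid sets is shown below; the 20% validation fraction, the fixed seed and the function name are assumptions, not values from the Darknet instructions.

```python
import random


def split_list(paths, valid_fraction=0.2, seed=0):
    """Shuffle the image paths reproducibly and return (train, valid) lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


paths = [f"data/obj/img{i}.jpg" for i in range(10)]
train, valid = split_list(paths)
print(len(train), len(valid))
# prints: 8 2
```

The two lists can then be written, one path per line, to data/train.txt and data/valid.txt.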

After Training

Increase the network resolution in your .cfg-file (height=608 and width=608, or any other multiple of 32). This increases precision at the cost of a longer running time.